26 research outputs found

    Using the Output Embedding to Improve Language Models

    We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied model. We also offer a new method of regularizing the output embedding. Our methods lead to a significant reduction in perplexity, as we are able to show on a variety of neural network language models. Finally, we show that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance.
    Comment: To appear in EACL 2017
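
    The tying itself is a one-line constraint: the output projection reuses the input embedding matrix. A minimal PyTorch sketch of the idea follows; the TiedLM class and the LSTM encoder are illustrative choices, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TiedLM(nn.Module):
    """Toy language model whose input and output embeddings share one matrix."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)         # input embedding U
        self.rnn = nn.LSTM(dim, dim, batch_first=True)     # any sequence encoder
        self.out = nn.Linear(dim, vocab_size, bias=False)  # output embedding V
        self.out.weight = self.embed.weight                # weight tying: V = U

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.rnn(self.embed(tokens))
        return self.out(hidden)                            # logits over the vocabulary

logits = TiedLM(vocab_size=10_000, dim=256)(torch.randint(0, 10_000, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 10000])
```

    Because the two matrices become one parameter, a model also drops a vocabulary-sized matrix from its parameter count, which is where the size reduction for translation models comes from.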

    How Language Model Hallucinations Can Snowball

    A major risk of using language models in practical applications is their tendency to hallucinate incorrect statements. Hallucinations are often attributed to knowledge gaps in LMs, but we hypothesize that in some cases, when justifying previously generated hallucinations, LMs output false claims that they can separately recognize as incorrect. We construct three question-answering datasets where ChatGPT and GPT-4 often state an incorrect answer and offer an explanation with at least one incorrect claim. Crucially, we find that ChatGPT and GPT-4 can identify 67% and 87% of their own mistakes, respectively. We refer to this phenomenon as hallucination snowballing: an LM over-commits to early mistakes, leading to more mistakes that it otherwise would not make.
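
    The core measurement separates commitment from recognition: the model first answers and justifies in one pass, then verifies its own claim in a fresh context. A sketch of that two-stage probe follows; the ask(prompt) callable and the prompt wording are hypothetical placeholders, not the paper's datasets or exact protocol.

```python
def snowball_probe(question: str, ask) -> dict:
    """Two-stage probe: elicit an answer, then have the model verify it afresh.

    `ask` is any callable mapping a prompt string to the model's reply string.
    """
    # Stage 1: the model commits to an answer and a justification in one pass;
    # a wrong initial answer tends to pull the explanation along with it.
    answer = ask(f"{question}\nAnswer yes or no, then justify your answer.")

    # Stage 2: a fresh context, so verification is not conditioned on the
    # earlier commitment. Snowballing shows up when the model now flags as
    # incorrect a claim it produced in stage 1.
    verdict = ask(
        "Is the following claim correct? Answer 'correct' or 'incorrect'.\n"
        f"Claim: {answer}"
    )
    return {"answer": answer, "self_verdict": verdict}
```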

    Measuring and Narrowing the Compositionality Gap in Language Models

    We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems. We measure how often models can correctly answer all sub-problems but not generate the overall solution, a ratio we call the compositionality gap. We evaluate this ratio by asking multi-hop questions with answers that require composing multiple facts unlikely to have been observed together during pretraining. In the GPT-3 family of models, we show that as model size increases, single-hop question answering performance improves faster than multi-hop performance does, so the compositionality gap does not decrease. This surprising result suggests that while more powerful models memorize and recall more factual knowledge, they show no corresponding improvement in their ability to perform this kind of compositional reasoning. We then demonstrate how elicitive prompting (such as chain of thought) narrows the compositionality gap by reasoning explicitly instead of implicitly. We present a new method, self-ask, that further improves on chain of thought. In our method, the model explicitly asks itself (and then answers) follow-up questions before answering the initial question. Finally, we show that self-ask's structured prompting lets us easily plug in a search engine to answer the follow-up questions, which additionally improves accuracy.
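
    The self-ask scaffold is easy to reproduce in a few lines. The sketch below assumes a hypothetical llm(prompt) completion call and a search(query) engine; both names are placeholders, and the few-shot demonstrations the method prepends to the prompt are omitted for brevity.

```python
FOLLOW_UP = "Follow up:"
INTERMEDIATE = "Intermediate answer:"
FINAL = "So the final answer is:"

def self_ask(question: str, llm, search, max_hops: int = 4) -> str:
    """Answer a multi-hop question by letting the model pose sub-questions."""
    prompt = f"Question: {question}\nAre follow up questions needed here: Yes.\n"
    for _ in range(max_hops):
        step = llm(prompt)  # model emits either a follow-up or the final answer
        if FINAL in step:
            return step.split(FINAL, 1)[1].strip()
        if FOLLOW_UP in step:
            # Route the sub-question to a search engine instead of trusting
            # the model's parametric memory, then feed the result back in.
            sub_q = step.split(FOLLOW_UP, 1)[1].splitlines()[0].strip()
            prompt += f"{FOLLOW_UP} {sub_q}\n{INTERMEDIATE} {search(sub_q)}\n"
        else:
            prompt += step  # keep whatever partial reasoning the model produced
    return llm(prompt + FINAL).strip()  # force an answer if the hop budget runs out
```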

    SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

    Language models have outpaced our ability to evaluate them effectively, but for their future development it is essential to study the frontier of their capabilities. We consider real-world software engineering to be a rich, sustainable, and challenging testbed for evaluating the next generation of language models. We therefore introduce SWE-bench, an evaluation framework including 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories. Given a codebase along with a description of an issue to be resolved, a language model is tasked with editing the codebase to address the issue. Resolving issues in SWE-bench frequently requires understanding and coordinating changes across multiple functions, classes, and even files simultaneously, calling for models to interact with execution environments, process extremely long contexts, and perform complex reasoning that goes far beyond traditional code generation. Our evaluations show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues. Claude 2 and GPT-4 solve a mere 4.8% and 1.7% of instances respectively, even when provided with an oracle retriever. Advances on SWE-bench represent steps towards LMs that are more practical, intelligent, and autonomous.
    Comment: Data, code, and leaderboard are available at https://www.swebench.co
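
    The task format reduces to a simple loop: give the model an issue and a codebase snapshot, apply whatever patch it proposes, and count the instance as resolved only if the associated tests pass. A schematic sketch, where Instance, generate_patch, apply_patch, and run_tests are hypothetical stand-ins for the real harness:

```python
from dataclasses import dataclass

@dataclass
class Instance:
    repo: str          # a Python repository identifier
    issue_text: str    # natural-language description of the bug or feature
    base_commit: str   # codebase snapshot the model must edit
    tests: list[str]   # tests that must pass after the fix

def evaluate(instances, generate_patch, apply_patch, run_tests) -> float:
    """Fraction of issues resolved by the model's proposed patches."""
    resolved = 0
    for inst in instances:
        patch = generate_patch(inst.repo, inst.base_commit, inst.issue_text)
        workdir = apply_patch(inst.repo, inst.base_commit, patch)
        if run_tests(workdir, inst.tests):  # resolved only if the issue's
            resolved += 1                   # associated tests now pass
    return resolved / len(instances)
```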

    Architecture of Planetary Systems Based on Kepler Data: Number of Planets and Coplanarity

    We investigated the underlying architecture of planetary systems by deriving the distribution of planet multiplicity (number of planets) and the distribution of orbital inclinations based on the sample of planet candidates discovered by the Kepler mission. The scope of our study included solar-like stars and planets with orbital periods less than 200 days and with radii between 1.5 and 30 Earth radii, and was based on Kepler planet candidates detected during Quarters 1 through 6. We created models of planetary systems with different distributions of planet multiplicity and inclinations, simulated observations of these systems by Kepler, and compared the properties of the transits of detectable objects to actual Kepler planet detections. Specifically, we compared with both the Kepler sample's transit numbers and normalized transit duration ratios in order to determine each model's goodness-of-fit. We did not include any constraints from radial velocity surveys. Based on our best-fit models, 75-80% of planetary systems have 1 or 2 planets with orbital periods less than 200 days. In addition, over 85% of planets have orbital inclinations less than 3 degrees (relative to a common reference plane). This high degree of coplanarity is comparable to that seen in our Solar System. These results have implications for planet formation and evolution theories. Low inclinations are consistent with planets forming in a protoplanetary disk, followed by evolution without significant and lasting perturbations from other bodies capable of increasing inclinations.
    Comment: 16 pages, 7 figures, accepted to ApJ
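
    The forward-modeling step described above is straightforward to sketch: draw a multiplicity and a set of mutual inclinations for each synthetic system, orient it randomly, and count which planets would transit. The toy model below uses NumPy; the multiplicity prior, period range, and Rayleigh inclination scatter are illustrative stand-ins, not the paper's fitted distributions, and detection efficiency is ignored.

```python
import numpy as np

rng = np.random.default_rng(0)

def transit_count_histogram(n_systems: int = 50_000, sigma_deg: float = 2.0):
    """Histogram of transiting planets per system in a toy forward model."""
    counts = np.zeros(n_systems, dtype=int)
    for s in range(n_systems):
        k = rng.choice([1, 2, 3], p=[0.5, 0.3, 0.2])         # multiplicity (toy prior)
        period = rng.uniform(3.0, 200.0, size=k)             # orbital period in days (P < 200 d)
        a_over_r = 215.0 * (period / 365.25) ** (2.0 / 3.0)  # a/R* via Kepler's third law, solar-like star
        i_sys = np.arccos(rng.uniform(0.0, 1.0))             # isotropic orientation of the reference plane
        tilt = np.radians(rng.rayleigh(sigma_deg, size=k))   # mutual inclinations about that plane
        tilt *= rng.choice([-1.0, 1.0], size=k)              # random sense of the tilt
        b = a_over_r * np.abs(np.cos(i_sys + tilt))          # transit impact parameter
        counts[s] = int((b < 1.0).sum())                     # planets crossing the stellar disk
    return np.bincount(counts, minlength=4)

print(transit_count_histogram())  # systems showing 0, 1, 2, 3 transiting planets
```

    Comparing such simulated transit-count histograms (together with transit duration ratios) against the observed Kepler yield is what constrains the underlying multiplicity and inclination distributions.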

    The multi-configurational time-dependent Hartree method for bosons: Many-body dynamics of bosonic systems

    The evolution of Bose-Einstein condensates is amply described by the time-dependent Gross-Pitaevskii mean-field theory, which assumes all bosons to reside in a single time-dependent one-particle state throughout the propagation process. In this work, we go beyond mean-field and develop an essentially-exact many-body theory for the propagation of the time-dependent Schrödinger equation of N interacting identical bosons. In our theory, the time-dependent many-boson wavefunction is written as a sum of permanents assembled from orthogonal one-particle functions, or orbitals, where both the expansion coefficients and the permanents (orbitals) themselves are time-dependent and fully determined according to a standard time-dependent variational principle. By employing either the usual Lagrangian formulation or the Dirac-Frenkel variational principle, we arrive at two sets of coupled equations-of-motion, one for the orbitals and one for the expansion coefficients. The first set comprises first-order differential equations in time and non-linear integro-differential equations in position space, whereas the second set consists of first-order differential equations with time-dependent coefficients. We call our theory multi-configurational time-dependent Hartree for bosons, or MCTDHB(M), where M specifies the number of time-dependent orbitals used to construct the permanents. Numerical implementation of the theory is reported, and illustrative numerical examples of many-body dynamics of trapped Bose-Einstein condensates are provided and discussed.
    Comment: 30 pages, 2 figures
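
    In standard notation, the ansatz described in words above can be written compactly; the following is a sketch of the expansion only, not the full working equations.

```latex
% Many-boson ansatz: both the coefficients and the orbitals carry the time
% dependence, and both are fixed by the time-dependent variational principle.
\begin{equation}
  \left|\Psi(t)\right\rangle
    = \sum_{\vec{n}} C_{\vec{n}}(t)\,\left|\vec{n}; t\right\rangle,
  \qquad
  \vec{n} = (n_1,\dots,n_M), \quad \sum_{k=1}^{M} n_k = N,
\end{equation}
% where |n; t> is the permanent (symmetrized product) with n_k bosons in the
% time-dependent orbital phi_k(r, t); M = 1 recovers the single-orbital
% mean field of Gross-Pitaevskii type.
```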

    Time-dependent multi-orbital mean-field for fragmented Bose-Einstein condensates

    The evolution of Bose-Einstein condensates is usually described by the famous time-dependent Gross-Pitaevskii equation, which assumes all bosons to reside in a single time-dependent orbital. In the present work we address the evolution of fragmented condensates, for which two (or more) orbitals are occupied, and derive a corresponding time-dependent multi-orbital mean-field theory. We call our theory TDMF(n), where n stands for the number of evolving fragments. Working equations for a general two-body interaction between the bosons are explicitly presented, along with an illustrative numerical example.
    Comment: 16 pages, 1 figure
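
    For two fragments, the mean-field ansatz described above is a single permanent with fixed occupations; a sketch in standard notation, where the occupations n_1 and n_2 are parameters of the theory:

```latex
% TDMF(2) ansatz: one permanent, fixed occupations, time-dependent orbitals.
\begin{equation}
  \Psi(\mathbf{r}_1,\dots,\mathbf{r}_N,t)
    = \hat{\mathcal{S}}\,
      \prod_{j=1}^{n_1} \phi_1(\mathbf{r}_j,t)
      \prod_{j=n_1+1}^{N} \phi_2(\mathbf{r}_j,t),
  \qquad n_1 + n_2 = N,
\end{equation}
% where S-hat symmetrizes over the bosons and each orbital phi_i evolves
% under its own coupled equation of motion; n fragments use n orbitals.
```

    Unlike the MCTDHB expansion above, only the orbitals evolve here while the occupation numbers stay fixed, which is what makes this a mean-field theory rather than a full many-body expansion.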